A nonconvex TVq-l1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox{TV}_q-l_1$$\end{document} regularization model and the ADMM based algorithm

The total variation (TV) regularization with l1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_1$$\end{document} fidelity is a popular method to restore the image contaminated by salt and pepper noise, but it often suffers from limited performance in edge-preserving. To solve this problem, we propose a nonconvex TVq-l1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox{TV}_q-l_1$$\end{document} regularization model in this paper, which utilizes a nonconvex lq\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_q$$\end{document}-norm (0<q<1)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(0<q<1)$$\end{document} defined in total variation (TV) domain (called TVq\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox{TV}_q$$\end{document} regularizer) to regularize the restoration, and uses l1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_1$$\end{document} fidelity to measure the noise. Compared to the traditional TV model, the proposed model can more effectively preserve edges and contours since it provides a more sparse representation of the restoration in TV domain. An alternating direction method of multipliers (ADMM) combining with majorization-minimization (MM) scheme and proximity operator is introduced to numerically solve the proposed model. In particular, a sufficient condition for the convergence of the proposed algorithm is provided. Numerical results validate the proposed model and algorithm, which can effectively remove salt and pepper noise while preserving image edges and contours. In addition, compared with several state-of-the-art variational regularization models, the proposed model shows the best performance in terms of peak signal to noise ratio (PSNR) and mean structural similarity index (MSSIM). We can obtain about 0.5 dB PSNR and 0.06 MSSIM improvements against all compared models.

Images are often contaminated by additive noise during the formation, transition or recording process, usually modeled as: where u is the original true image, f is the corresponding noisy version, and n represents additive noise. Solving the u from the linear system (1) is a classical inverse problem, which is actually an ill-posed problem since the solution of u is non-unique and is very sensitive to the initialization. A nature method to address this problem is regularization technique and functional minimization by introducing some prior informations on the restorations [1][2][3][4][5] , usually formulated as: where R(u) is regularization term that embodies the priors, F(f − u) is the fidelity term that forces the closeness of the restoration u to the observation f, U is a function space modeling the restoration u, and > 0 is a tuning parameter that controls the tradeoff between the two terms.
For the regularization term, the earliest regularizer is Tikhonov regularization term proposed by Phillips 6 and Tikhonov 7 in 1960s, which is defined as a quadric functional of the l 2 norm of |∇u| , i.e., �∇u� 2 2 . Tikhonov regularization has the strong ability of noise removing. However, it often overly smoothes the image edges. Rudin, Osher and Fatemi in 1992 8 proposed total variation (TV) regularization to address this over-smoothing problem. The function measured by TV allows for discontinuities along curves during the functional minimization, therefore edges and contours can be preserved in the restoration u. Later on, many scholars have done a lot of research on TV regularization, and proposed lots of improved TV-based regularization terms, such as high-order TV 9,10 , hybrid TV 11,12 , non-local TV 13,14 , overlapping TV 15,16 , anisotropic TV 17 . We note that the TV-based regularizers mentioned above are convex functional.
(1) f = u + n, In the last decades, nonconvex regularization based on sparse priors has attracted much attention and found wide applications. It is based on the observation that signals (also images) usually have the very sparse representation in some transformed domains (such as Fourier transform, cosine transform), or in some dictionaries (such as wavelet dictionary, framelet dictionary, self-adaptive dictionary) 18,19 . It is well known that l 0 -norm measured by the number of nonzero entries is the exact measurement of the sparsity. However, it is difficult to be solved in the practice. A popular method to attack this problem is to use the l 1 -norm as a relaxation measurement, which is a convex functional and makes the problem easier to solve. It has been shown that under some assumptions, the regularization problems with such l 1 relaxation leads to a near optimal sparse solution. To further encourage the sparsity of the solutions, some nonconvex regularizers are proposed since nonconvex functions are much closer to the l 0 -norm than convex counterparts 20,21 . Since the seminal work of Geman and Geman in 22 , various nonconvex regularization models have been proposed, such as [23][24][25][26][27] . Although nonconvex optimization problems cannot guarantee the existence and uniqueness of the solution, and will lead to complex numerical calculation, a variety of applications (e.g., [28][29][30][31] ) have shown that nonconvex regularization models outperform the convex counterparts, and yield the restorations of high quality with sharp and neat edges. In addition, Nikolova et al. 25,30 provided a theoretical explanation for this phenomenon.
For the fidelity term, one always uses the l 2 -norm �f − u� 2 2 to measure the closeness between the restoration u and the observation f [9][10][11][12][13][14][15][16] . It is well known that such least-squares fitting using l 2 -norm yields the mean filtering, which is only suitable for removing the additive Gaussian noise, but fails for salt and pepper noise. While the least-absolute fitting using l 1 -norm leads to the median filtering that is less sensitive to the outliers. So l 1 -norm fidelity term �f − u� 1 is suitable for removing the salt and pepper noise. A lot of regularization models with l 1 -norm fidelity have been proposed for salt and pepper noise removal, such as 3,26,[32][33][34] . In addition, Chan and Esedoglu in 34 demonstrated that TV regularization with l 1 -norm fidelity term (TVL1) is contrast invariant, as opposed to that with l 2 -norm fidelity term. However, TVL1 model has limited performance in edge-preserving due to the use of the convex TV regularizer. We note in passing that Meyer in 35 suggested to use some weakerthan-l 2 (also l 1 ) norms as the fidelity term to measure the residual. He introduced three functional spaces, G, E and F, to model the oscillatory functions, which are very suitable for image cartoon-texture decomposition, but not suitable for salt and pepper noise removing.
Based on the above analysis, we note that: (1) TV regularization with l 1 fidelity can successfully remove salt and pepper noise, but lacks the ability of edge-preserving; (2) Although nonconvex regularization can preserve image edges well, few studies concern on salt and pepper noise removal. In order to solve these problems, and effectively remove salt and pepper noise while better preserving image edges and contours, a nonconvex TV q − l 1 regularization model is proposed in this paper. It utilizes a nonconvex TV q regularizer defined in TV domain to model the restoration u, and employs the l 1 -norm as the fidelity term for the noise f − u . So, the proposed model can remove the salt and pepper noise while preserving image edges and contours due to the combination of nonconvex regularization and l 1 -norm fidelity term. A first-order algorithm based on the alternating direction method of multipliers (ADMM) combining with MM scheme proximity operator is developed to numerically solve this nonconvex model. In addition, a sufficient condition for the convergence of the proposed algorithm is provided. The main contributions of this work are as follows: • A nonconvex TV q regularization variational model with l 1 -norm fidelity is proposed. Although much research has been done on the nonconvex regularization and l 1 -norm fidelity term separately, to the best of our knowledge, there are every few studies on the issue of the combination of the nonconvex regularization and l 1 -norm fidelity. A few recent works can be seen in 3,26,32 . We note that the nonconvex regularizers in these literatures are defined in the image domain itself, or in the coefficient domain on a basis, whereas our regularizer is in TV domain. Compared with models in 3,26,32 , nonconvex TV regularization has superior performance in edge-preserving, we refer the readers in 25,30 for more details. • A first-order algorithm based on ADMM combining with proximity operator is introduced for the nonconvex model. In addition, the convergence property of the proposed algorithm is analyzed. We note that for the "nonconvex regularization + l 1 -norm fidelity" models, the authors in 3,26,32 also used ADMM framework. But 3 did not give a convergence analysis. The authors in 26,32 derived a convergent algorithm by smoothing the l 1 -norm fidelity term. Different from the methods in 26,32 , we give a convergence analysis under some nature assumptions to the proposed functional and the parameters.
The structure of this paper is organized as follows. "Introduction" section presents the background and start of this study. "Related work" section gives some background knowledge involving TVL1 model, nonconvex regularization and proximity operator. "Methods" section details the proposed nonconvex TV q − l 1 regularization model, and introduces an efficient numerical algorithm for the proposed model. A sufficient condition for the convergence of the proposed algorithm is also provided in this section. "Results" section discusses the performance of the proposed model and algorithm. "Conclusion" section presents the results. The work ends with concluding remarks.

Related work
In this section, we recall some background knowledge that are very related to our present work, where TVL1 model is the seminal work for salt and pepper noise removal under the variational regularization framework, nonconvex regularization provides the design of regularizer for the proposed model in this paper, and proximity operator is used to solve the nonconvex subproblem in the ADMM algorithm. www.nature.com/scientificreports/ TVL1 model. Rudin et al. 8 proposed the following TV regularization model with l 2 -norm fidelity term (TVL2) to address over-smoothing problem often arising in the Tikhonov regularization, where �∇u� 1 = |∇u|dx is the regularization term, and �f − u� 2 2 is l 2 -norm fidelity term. Model (2) is convex with respect to u and easy to be solved in the practice. TVL2 model (2) is suitable for Gaussian noise removing. In addition, TV energy does not penalize the discontinuity of the functions along the contours, so the edges can be preserved in the restoration u by model (2).
Chan and Esedoglu in 34 proposed the following TV regularization model with l 1 -norm fidelity term (TVL1), TVL1 model (3) is more suitable for salt and pepper noise removing than TVL2 model (2). In addition, compared to TVL2, TVL1 model is contrast invariant. The authors in 34 gave a simple but illustrative example to show the characteristics of the solutions of TVL1 model and TVL2 model. Assuming the observed image f(x) being a characteristic function 1 B r (0) (x) of a disk B r (0) that is centered at the origin and with radius r, they derived the close-form solutions of (2) and (3). The solution of TVL2 model (2) can be written as: Assuming the minimizer of TVL1 model (3) has to be of the form c1 B r (0) (x) for some constant c ∈ [0, 1] , they get the solution as: From (4) and (5), we observe that both disks in the TVL2 and TVL1 solutions vanish if the radiuses are less than 2/ . But for the disks whose radiuses r are greater than 2/ , TVL1 model preserves these disks intactly, i.e., u TVL1 = f , in contrast to the "contrast loss" phenomena in TVL2 model, where the loss is inversely proportional to r . This intuitive example indicates that TV regularization with l 1 -norm fidelity can better preserve the contrast of the images than that with l 2 -norm fidelity in the application of image restoration.
Nonconvex regularization. From the view of sparse-representation, TV energy is actually the l 1 -norm of the gradient module, which can be seen as a relaxation of the l 0 -norm that is the accurate measurement of the sparsity. To promote the sparsity of the entries, nonconvex measurement is a good candidate since it approximates the l 0 -norm more closely than l 1 -norm. Nikolova et al. 25,30 proposed the following nonconvex TV regularization model with l 2 -norm fidelity term, which is called NTVL2 model in the following: where ϕ(t) is a nonconvex potential function, and �ϕ(|∇u|)� 1 = � ϕ(|∇u|)dx is the nonconvex regularization term. Since nonconvex function ϕ(t) is closer to the l 0 -norm than l 1 -norm, NTVL2 model (6) can obtain the more sparse representation of |∇u| than TVL2 model (2). Furthermore, compared to the TVL2 model, NTVL2 model encourages the penalty to the pattern of small variation, while decreases the penalty to the pattern with large variation. So, NTVL2 model has a superior performance in noise-removing and edge-preserving than classical TVL2 model. However, NTVL2 model (2) is only suitable for Gaussian noise removing due to the use of l 2 -norm fidelity. To achieve the sparse recovery in the presence of salt and pepper noise, recently some nonconvex regularization models with l 1 -norm fidelity have been proposed 3,26,32 , called NRL1, which are defined as follows: where P(·) is a nonconvex function for sparsity promotion. If A is a identity matrix, model (7) is to recover the sparse image u. If A is a sensing matrix accumulated by a basis, model (7) is to recover the image Au which has the most sparse representation on this basis. We note that the nonconvex regularizer in (7) is defined in the image domain itself, or in the coefficient domain on a basis. To inherit the advantages of nonconvex TV regularization in image restoration, we propose a generalized nonconvex regularization variational model for salt and pepper noise removal. It utilizes a generalized nonconvex regularizer defined in the TV domain as the priors to model the restorations, and employs the l 1 -norm as the fidelity term to measure the noises. New model can effectively remove salt and pepper noise due to the use of l 1 -norm fidelity; and well preserve image edges and contours due to the use of nonconvex TV regularization.
Proximity operator of l q function. The proximity operator is a generalized form of the projection operator, often used to solve non-differentiable optimization problems. In this paper, we use it to solve the nonconvex www.nature.com/scientificreports/ subproblems in the iterative algorithm. For a proper and lower semi-continuous function P(x), the corresponding proximity operator is defined as 36,37 , Intuitively, proximity operator prox P,ρ (t) is to approximate the point t with some other point x under the norm �x − t� 2 2 and the penalty P(x). The positive parameter ρ > 0 is introduced as a means to control the approximation. In the following, we review the proximity operator for l q (0 < q < 1) function, which will be used in our numerical implementation.
When the penalty is given as l q -norm (0 < q < 1) , i.e., the proximity operator does not has a closed-form expression except for two special cases of q = 1/2 and q = 2/3 . When P(x) = |x| 1/2 , the corresponding proximity operator is a l 1/2 thresholding function 38,39 , where When P(x) = |x| 2/3 , the corresponding proximity operator is a l 2/3 thresholding function 39,40 , where and For any other q, the authors in 41 give a semi-implicit expression of the proximity operator with l q -norm penalty, which is defined as: In (10), the threshold τ satisfies that τ = β + qβ q−1 /ρ with β = 2(1 − q)/ρ 1 2−q , and y * is the shrinkage that has not explicit expression. It is a zero point of the non-linear function h(y) = qy q−1 + ρy − ρ|t| over the region (β, |t|).

Methods
The main purpose of this paper is to effectively remove salt and pepper noise while successfully preserve image edges and contours in image restoration under the variational framework. Firstly, a variational regularization model combining nonconvex regularization and l 1 fidelity is proposed, which is actually a minimization problem. And then, the classical ADMM algorithm is developed to numerically solve the proposed model, which is programmed by MATLAB software in the experiments. Finally, some commonly used test images and datasets are used to validate the proposed model and algorithm. The PSNR and MSSIM indexes are used as the means to quantitatively evaluate the performance. Figure 1 shows a flow chart to clarify the study design of the present work.
The proposed nonconvex TV q − l 1 regularization model. In this section, we present a nonconvex TV q − l 1 regularization variational model, called NTVL1 in the following, which is defined as: www.nature.com/scientificreports/ where P(Bu) is the regularization term, in which P(·) : R + → R + is a continuous, increasing and nonconvex l q function for sparsity promotion, and B is the gradient operator |∇| = (∇ 2 x + ∇ 2 y ) 1/2 . We note in passing that B can be choose some other difference operators, such as x-directional difference ∇ x , y-directional difference ∇ y , and anisotropic difference operator ∇ x + ∇ y . The l 1 -norm �f − u� 1 is the fidelity term. U is a function space (e.g., Sobolev space, bounded variation space). And is a positive tuning parameter, which balances the regularization term and fidelity term. Model (11) combines the advantages of nonconvex TV regularization and l 1 -norm fidelity. It can effectively remove the salt and pepper noises by l 1 -norm fidelity term, while preserving the valuable edges and contours via nonconvex TV regularization.

Remark 1
Although much research has been done on the nonconvex regularization and l 1 -norm fidelity term separately, to the best of our knowledge, there are every few studies on the combination of them. A few works can be seen in 26,32 , called NRL1, which are defined as follows: where P(u) is a nonconvex regularization term to measure the sparsity of u, and A is a transformation matrix. If A is a identity matrix, NRL1 model is to recover the sparse image u, and if A is a sensing matrix accumulated by a basis, it recovers the image Au that has the most sparse representation on the basis A. But differed from NRL1 regularizer P(u), our nonconvex regularizer is P(Bu) that is defined in TV domain. In image restoration application, such scheme can better preserve edges and contours than NRL1.

Remark 2
The model (11) can effectively preserve edges and contours in the restoration u due to the use of nonconvex TV regularization term. Using a basis of a local framework (N, T), where N is normal direction defined as N = ∇u/|∇u| , and T is the corresponding tangent direction defined as T = ∇u ⊥ /|∇u| , we derive the Euler-Lagrange equation associated with (11), where u TT and u NN are the second derivatives of u in T and N directions, and P ′ (|Bu|)/|Bu| and P ′′ (|Bu|) can be seen as the adaptive diffusion velocity along T-direction and N-direction, respectively. It is obviously that P ′ (t) is a monotony decrease function and satisfies P ′′ (t) < 0 since P(t) is a nonconvex increasing function.
Along T-direction, for the image pixels where |Bu| ≈ 0 (homogeneous regions), the diffusion Eq. (12) has strong smoothing effect since the diffusivity P ′ (|Bu|)/|Bu| is of a large value. And for the image pixels where the value of |Bu| is large (edges), the model (12) has weak smoothing effect since the value of diffusivity P ′ (|Bu|)/|Bu| is small. Along N-direction, the adaptive diffusivity always satisfies P ′′ (|Bu|) < 0 for each image pixel, which means that diffusion in normal direction is always reverse. Based above, we can conclude that the proposed model can effectively smooth the image homogeneous regions, while still preserving the edges and contours very well. www.nature.com/scientificreports/ The proposed algorithm. Obviously, model (11) is a nonconvex and nonsmooth optimization problem since the first term is nonconvex, and the second term is nonsmooth. In this section, we propose an efficient first-order algorithm to solve this model using ADMM framework. ADMM algorithm decouples the variables and makes the global problem easy to tackle, which is very suitable to solve the distributed optimization and high-dimensional optimization problems. With the use of an auxiliary Bu = v , ADMM algorithm is to solve the following linearly constrained reformulation of (11): Transforming (13) into an augmented Lagrangian formulation, we obtain where u and v are primal variables, p is the Lagrangian multiplier, also called dual variable, and ρ > 0 is a penalty parameter. Functional (14) can be simplified as: where C = 1 2ρ �p� 2 2 that can be neglected in the minimization problem. Then, we alternatively minimize (15) with respect to u and v, and then update the multiplier p. Specifically, the minimization solutions (u k+1 , v k+1 ) are obtained alternatively while the other variables are fixed, which leads to the following iteration scheme.
• Step 1. Fixing variables v and p, we minimize the energy L(u, v; p) with respect to u.
• Step 2. Fixing variables u and p, we minimize the energy L(u, v; p) with respect to v.
• Step 3. Updating Lagrangian multiplier p as follows: ADMM algorithm solves the original model (11) by alternatively updating the above steps. In the following, we solve the subproblems (16) and (17) in detail.
Solve the subproblem (16) with respect to u. Using an auxiliary variable w = u − f , we convert the minimization problem (16) into an equivalent form, Then, the optimal u k+1 can be computed by u k+1 = f + w k+1 . The w-subproblem (19) is actually l 1 -regularized least squares problem. We use a majorization-minimization (MM) scheme to solve this subproblem approximately. Specifically, let n k = v k − Bf − p k ρ , we majorize the quadratic functional �Bw − n k � 2 2 in the objective functional (19) with a simple surrogate functional by linearizing it at point w k , where d w k is the gradient of the quadratic functional �Bw − n k � 2 2 at point w k , computed by d w k = B T (Bw k − n k ) , and τ > 0 is a proximal parameter. Using such an approximation of Bw − n k 2 2 in (19), we denote the new energy as F(w, w k ) . Obviously, when the proximal parameter τ satisfies 1/τ > max B T B , where max B T B denotes the maximum eigenvalue of the matrix B T B , the new energy F(w, w k ) satisfies the classical MM conditions: (i) F(w, w k ) ≥ F(w) for all w, and (ii) F(w k , w k ) = F(w k ) . Minimizing the surrogate energy F(w, w k ) in stead of the original energy F(w), and neglecting the constant in the F(w, w k ) , we obtain the following minimization problem,  (20) is a classical l 1 + l 2 minimization problem, which can be explicitly solved by a soft thresholding with the shrink operator, i.e., where shrink operator is defined as: Solve the subproblem (17) with respect to v. Let m k = Bu k+1 + p k ρ , the v-subproblem (17) can be computed by the proximity operator, i.e., where prox P,ρ is the proximity operator for the function P(·) with penalty ρ.
The above states the algorithm that solves the augmented Lagrangian formulation L(u, v; p) with a fixed k. At last, in order to incorporate the algorithm into ADMM framework to solve the original nonconvex model (11), starting with the initial assignment as k = 0 , u 0 = f , v 0 = Bf and p 0 = 0 , we reiterate the above computing processes, each time updating the value of k as k + 1 . Consequently, the ADMM algorithm to our nonconvex variational model (11) is written as follows (Algorithm 1).
We note that for the Algorithm 1, it only needs one loop to iteratively update the function values. The computation load in each iteration is matrix multiplication. So the complexity of the Algorithm 1 is O(mn), where m is the size of the input images, and n is the number of the loop iterations.

Convergence analysis.
In this subsection, we analyze the convergence property of the Algorithm 1. Note that the convergence issue of ADMM algorithm for the convex models has been well addressed, such as [42][43][44][45] , while there are very few studies on the nonconvex cases. Inspired by the approaches and conclusions in 46,47 , we derive the following results for convergence of the Algorithm 1. Firstly, several assumptions are introduced, which will be used in the following convergence analysis. Assumption 1 Function P(·) is closed, proper and lower semicontinuous.

Assumption 2
The gradient of P(·) is Lipschitz continuous, i.e., for any x and y, there exists a positive constant K > 0 , such that: We note that here we use gradient ∇ rather than derivative since gradient is a generalization of the derivative.

Assumption 3
The penalty parameter ρ is chosen large enough such that ρ > K . In this case, the v-subproblem (17) is strongly convex. www.nature.com/scientificreports/ Assumption 4 The energy E(u) is bounded below, i.e., E = min E(u) > −∞.
We first show that the difference of the dual variable p in the iteration can be bounded above by that of the primal variable v. Lemma 1 Let (u k , v k ; p k ) be the sequence obtained by Algorithm 1, then we have following: Proof From the v update step (17), we have the following optimality condition: Combining with the dual variable update step (18), i.e.,

We have
By the assumption that the gradient of P is Lipschitz continuous, we have The desired result is obtained.
Next, we show that the augmented Lagrangian function L(u, v; p) is monotonically decreasing in the iterative process.
Lemma 2 Let (u k , v k ; p k ) be the sequence obtained by Algorithm 1, then we have following: where

Proof In (19), let
In w-subproblem (20), we actually minimizes the following approximated objective of (19), Because w k+1 is the minimizer of F w, w k , we have which implies that: where max B T B denotes the maximum eigenvalue of the matrix B T B . The inequality (24) combining with (25) yields: The desired result is obtained.

Lemma 3
Let (u k , v k ; p k ) be the sequence obtained by Algorithm 1, then we have following: where γ 2 is a positive constant associated with ρ.
Proof By the assumption, ρ > K implies that L u k+1 , v; p k is strongly convex respect to the variable v. So, we can deduce that there must exist a positive constant γ 2 (ρ) such that: Because v k+1 is a minimizer of L u k+1 , v; p k , we have It follows from (27) and (28) that: The desired result is obtained.

Lemma 4
Let (u k , v k ; p k ) be the sequence obtained by Algorithm 1, then we have following: Proof We first split the difference of the augmented Lagrangian function by: The first term in right side of (29) can be computed by: The second term in right side of (29) can be split as: which together with Lemmas 2 and 3 yields: Combining Eqs. (33) and (34), we obtain The desired result is obtained.
Lemma 4 implies that if the condition ργ 2 > 2K 2 is satisfied, then which implies that the value of the augmented Lagrangian function will always decrease with the iteration progressing. We note that as long as parameter γ 2 = 0 , one can always find a suitable ρ large enough such that the condition ργ 2 > 2K 2 is satisfied, since ργ 2 is monotonically increasing with respect to ρ , and 2K 2 is a constant associated with the function P(·).
Lemma 5 Let (u k , v k ; p k ) be the sequence obtained by Algorithm 1, then we have following: where E is the lower bound of E(u) defined in Assumption 4.
Lemma 4 shows that the augmented Lagrangian function L u k , v k ; p k is monotonically decreasing, and Lemma 5 shows that L u k , v k ; p k is bounded below. So, we can conclude that the augmented Lagrangian function L u k , v k ; p k is convergent as k → ∞.
Theorem 1 Let (u k , v k , p k ) be the sequence obtained by Algorithm 1, suppose that ρ > K , ργ 2 > 2K 2 and 1/τ > max (B T B) , then we have following: If U is a compact set, then the sequence z k = (u k , v k , p k ) converges a limit point z * = (u * , v * , p * ) . In addition, z * is a stationary point of the augmented Lagrangian function L(u, v; p).
Proof We first prove part (i) of the theorem. By Lemmas 4 and 5, we can conclude that the augmented Lagrangian function L u k , v k ; p k is convergent as k → ∞ , which implies that: By Lemma 4, we have Since ργ 2 > 2K 2 and γ 1 > 0 , taking limit for (40), and combining (39), we have By Lemma 1, we further obtain With the fact that p k+1 = p k + ρ Bu k+1 − v k+1 , using (41), we have which implies that: Next, we prove part (ii) of the theorem. We first show that there exists a limit point for the sequence z k = (u k , v k , p k ) . Since U is a compact set, and lim k→∞ u k+1 − u k 2 2 = 0 , there must exist a convergent subsequence u k i 1 of u k such that u k i 1 → u * . Since B is a bounded linear operator, and U is a compact set, we can deduce that the map set BU = {v : Bu = v, u ∈ U} is also a compact set. With the fact that lim k→∞ Bu k − v k 2 2 = 0 and lim k→∞ v k+1 − v k 2 2 = 0 , we can deduce that v k also lies in the compact set, and exists a convergent subsequence v k i 2 such that v k i 2 → v * . Note that ∇P(v) is Lipschitz continuous, and BU is a compact set, we can deduce that ∇P(v)(v ∈ BU) is bounded, which implies that ∇P(v k ) is a bounded sequence. With the fact that p k = ∇P(v k ) and lim k→∞ p k+1 − p k 2 2 = 0 , there must exist a convergent subsequence p k i 3 such that p k i 3 → p * , Selecting the same indexes from Next, we show that any limit point of the sequence z k is the a stationary point of the augmented Lagrangian function L (u, v, p). By the optimality conditions, the sequence z k = (u k , v k , p k ) satisfies that: Since z k i → z * = (u * , v * , p * ) as k i → ∞ , passing to the limit in (42) along the subsequence z k i , we obtain (37) P(v k+1 ) + ∇P(v k+1 ), Bu k+1 − v k+1 P(Bu k+1 ). www.nature.com/scientificreports/ which implies that z * = (u * , v * , p * ) is a stationary point of the augmented Lagrangian function L (u, v, p). The desired result is obtained.

Results
In this section, we show the effectiveness of the proposed model and algorithm in image denoising application. The programs are coded in MATLAB, and run on a PC with Intel Core i5 2.5G CPU and 4.00G RAM. The peak signal to noise ratio (PSNR) and mean structural similarity (MSSIM) index 48 are used as the means of judging the performance. The main experimental content of this paper is as follow: (1) The effectiveness of the proposed model, and the convergence of the algorithm.
(2) The effect of the nonconvex parameter q in the proposed model.
In all experiments, the difference operator B in the model (11) is chosen as the gradient operator |∇| . Then, P(|∇u|) is the nonconvex total variation measure of the input u. Here, we give the definition of ∇ in the discrete case. Rearranging the two-dimensional image matrix u in (11) into a vector by scanning the column one by one, we define the gradient operator ∇ in a matrix form, where I n is the n-dimensional identity matrix, ⊗ denotes the Kronecker product, and ∇ 1 is difference elementary matrix defined as: Then, let u be an image in R n 2 . The gradient of u can be computed as: In the proposed model (11), we use l q -norm as the nonconvex penalty function, i.e., P(x) = |x| q . Here, we only use the l q -norm penalty since (1) it has a flexible parametric form; (2) it's proximity operator corresponds to a thresholding function that is easy to compute in the practice; (3) the popular hard-and soft-thresholding is the special cases of our l q thresholding. By Theorem 6, the parameter ρ must be chosen large enough to guarantee the convergence conditions. However, the ADMM algorithm would be every slow and impractical if with a very large value of ρ . In this paper, we adopt the scheme in 26 to address this problem. Starting with a properly small value of ρ , we gradually increase the values of ρ in the iteration until reaching the target value, i.e., 0 < ρ 0 < ρ 1 < · · · < ρ k · · · . The stopping criterion for the proposed algorithm is that the relative-change between the restored images of the successive iterations is smaller than ε = 10 −3 . The parameter τ is set as τ = 0.9/ max (B T B) ; is manually tuned such that the restoration achieves the largest PSNR value.
The effectiveness of the proposed model. The first experiment aims to show the effectiveness of the proposed model and algorithm in image denoising application. The nonconvex regularization function is chosen as P(x) = |x| 1/2 whose corresponding proximity operator is computed by (9). Test images shown in the first column of Fig. 2 are two synthetic images and two real images with the size of 256 × 256 . The second column of Fig. 1 shows the corresponding noisy versions obtained by adding the salt and pepper noises with the density of 0.03 into the clean data. Here, Matlab built-in function imnoise are used to contaminate the images. The denoising results are shown in the last column of Fig. 1. From the results, we observe the following: (1) The proposed model is very effective for salt and pepper noises removing due to the use of the l 1 fidelity term. Almost all salt and pepper noises are removed in the restorations; (2)The image edges and contours can be preserved well by using nonconvex TV regularization. Next, we demonstrate the convergence property of the proposed algorithm by plotting two measures of the sequence u k conducted by Algorithm 1. Here, the test data are the images in the first experiment. Figure 3 shows the plots of the relative-change of the restorations versus iterations, where the relative-change of the restoration u in the iteration is computed by u k+1 − u k 2 / u k+1 2 . Figure 4 shows the plots of the energy E(u k ) computed by (11) versus iterations. From Fig. 4, we can see that the relative-change of u significantly decreases in the first few steps, and then converges to zero, which implies that u k+1 → u k as k → ∞ in the l 2 topology. And from Fig. 4, we observe that the energy E firstly decreases with the iteration progressing, and then converges to a constant, www.nature.com/scientificreports/ which implies that the limit point of the sequence u k is a local minimum point of the functional E(u). These two figures support the convergence analysis of the proposed algorithm in "The comparison experiment" section.
The test of different nonconvex parameter q. In this section, we test the l q -norm nonconvex penalty functions with different q-values in the interval of (0, 1). In the proposed algorithm, the v-subproblem is updated by proximity operator with l q -norm penalty. We note that when q = 1/2 and q = 2/3 , the corresponding proximity operators are l 1/2 and l 2/3 thresholding functions, which can be explicitly computed by (9) and (10), respectively. For any other values of q, the corresponding proximity operators are computed by (2.9), in which we need to solve a zero point y * of the non-linear function h(y) = qy q−1 + ρy − ρ|t| . In the numerical implementation, the zero point y * is solved by Newton method since h(y) is a convex function. Figure 5 shows the denoising results of the proposed model with different values of q, q ∈ {0.2, 0.5, 0.7, 0.9} , for two test images (Synthetic image A and Cameraman) with the size of 256 × 256 , where the the noisy images are obtained by adding the salt and pepper noises with the density of 0.03 into the clean data. We observe that, with these different nonconvex functions, the models all can remove the salt and pepper noises while preserving edges and contours in the restorations. However, the PSNR values listed in Table 1 show that in restoring the synthetic image, q = 0.2 yields the best performance, which is different from the results in restoring the Cameraman image, where q = 0.7 yields the best performance. In our opinion, this is due to the nature that real images are not strictly sparse as the synthetic sparse images in the TV domain.
The comparison experiment. In this subsection, we compare the proposed model with several state-ofthe-art models in denoising application.   Table 3. From the numerical results, we have the following conclusions: • l 1-norm fidelity term is more effective for salt and pepper noise and outlier removing than l 2 -norm. We observe that TVL2 and NTVL2 models perform well for Gaussian noise removing, but fail for salt and pepper noise. Some impulsive points are still remained in the restorations obtained by TVL2 and NTVL2 models. For TVL1 and NTVL1 models, however, it can be seen that these two models can remove the salt and pepper noise successfully. Almost all impulsive points are removed from the restorations by TVL1 and NTVL1 models. • Nonconvex regularization has the better performance in edges and contours preserving than convex ones.
Comparing the restorations of TVL2 and NTVL2 models, we observe that NTVL2 model keeps sharp features better than TVL2 model, even those impulsive points that are more prominent in the restorations by NTVL2 www.nature.com/scientificreports/ model. NTVL1 and TVL1 models can remove the impulsive points due to the use of the l 1 -norm fidelity. But obviously, the NTVL1 model significantly outperforms TVL1 model in preserving sharp contours and details. For example, in the face and camera of the cameraman, NTVL1 model restores more details and features than TVL1 model. In addition, Table 3 show that restorations by the NTVL1 model have slightly greater PSNR and MSSIM values than TVL1 model, which further demonstrates that nonconvex regularization has the better performance than convex regularization. • The proposed NTVL1 model has the largest PSNR and MSSIM values within these four models. It indicates that the combination of nonconvex regularization and l 1 -norm fidelity is promising in restoring the images contaminated by mixed Gaussian noise and salt and pepper noise.
Next, we apply the proposed model to several real images. Test images shown in Fig. 5 are "Pepper", "House", "Boats" and "Man" images with the size of 256 × 256 , which contain lots of edges, contours, details, textures, inhomogeneous regions and features of low contrast and so on. The noisy versions are obtained by adding the mixed Gaussian noise with σ = 0.01 and salt and pepper noise with d = 0.03 to the clean data. The PSNR and MSSIM values of the test noisy images are shown in Table 2. Again, we compare the proposed model with TVL2, NTVL2 and TVL1 models. Figure 10 shows the restoration results. One can clearly see that TVL2 and NTVL2 model can successfully remove Gaussian noise while preserving edges and contours. But they fail to www.nature.com/scientificreports/ remove salt and pepper noise. Some impulsive points are still remained in the restorations. TVL1 and NTVL1 can simultaneously remove Gaussian noise and salt and pepper noise while preserving the edges. But we can see that the proposed model preserves more image contours and details than TVL1. Table 3 and Figure 11 show that NTVL1 model has the largest PSNR and MSSIM values, which further demonstrates that our model has the best performance in restoring the images contaminated by mixed Gaussian and salt and pepper noise in these four models due to the use of the combination of nonconvex TV regularization and l 1 -norm fidelity.
The comparison with TGV, NLTV, NRL1, ASWMF and BM3D. In this experiment, we compare the proposed model with very famous total generalized variation (TGV) 10 , nonlocal total variation (NLTV) 13 , adaptive switching weighted median filter (ASWMF) 49 , nonconvex regularization model with l 1 -nrom fidelity (NRL1) 26 , and block-Matching and 3D filtering (BM3D) 50 . TGV is a high-order variation regularization model, which can  www.nature.com/scientificreports/ well restore piecewise smooth regions while preserving the edges. NLTV uses patch-distance rather than pointdistance to measure the nonlocal similarity of the image, which can better restore the image details than classic TV based models. ASWMF is based on median filting which can well remove the salt and pepper noise. NRL1 is a robust sparse recovery model with l 1 -norm fidelity, which can well restore the sparse image, or image with sparse representation on some basises. In the experiment, as in 26 , the sensing matrix A in NRL1 (8) is chosen as a partial discrete cosine transformation matrix. BM3D is a hybrid model, which combines block-Matching, 3D linear transform thresholding, and Wiener filtering. It is probably one of the best methods so far in image denoising application. We use three images ("House", "Boats" and "Cameraman") with the size of 256 × 256 as the test data for comparisons. All images are contaminated by mixed Gaussian noise with σ = 0.015 and salt and pepper noise with d = 0.04 . The restoration results are shown in Fig. 12. To save space, we here only show the result of House. We can see that TGV and NLTV can remove the Gaussian noise, but fail to remove salt and pepper noise. NLTV obtain the results with higher visual quality than TGV. ASWMF can well remove the salt and pepper noise, but blurs the edges. Our model and NRL1 can successfully remove the mixed noises, while well preserving the images edges and contours. BM3D has the best performance in terms of visual quality. The PSNR values are listed in Table 4 and Fig. 13. From the results, we note that the proposed model obtains the results with higher  www.nature.com/scientificreports/ PSNR values compared with TGV, NLTV, ASWMF and NRL1. BM3D has the largest PSNR values in these six models. Although BM3D works better than the proposed model, we think that the proposed model is still worthy of consideration since it needs lower computational complexity compared to BM3D, and outperforms other popular models.
The comparison in Set5 and Set13 datasets. In the last experiment, to further show the effectiveness and adaptability of the proposed model, we test the proposed model on Set5 and Set13 datasets 28 . The test images in these two datasets are contaminated by mixed Gaussian noise and salt and pepper noise. Again, we compare the proposed model with five TV based models: TVL2, NTVL2, TVL1, TGV and NLTV. The PSNR values of the results are shown in Tables 5 and 6. The second column in the tables is the noise level. The two numbers are the variance of the Gaussian noise and the density of the salt and pepper noise, respectively. And Fig. 14 shows the line chart of the average PSNR on the two datasets. From the results, we observe that the proposed model achieves the

Conclusions
This paper introduces a novel variational regularization model to restore images contaminated by salt and pepper noise. Different from the very famous TVL1 model, the proposed model uses a nonconvex total variation TV q (0 < q < 1) as the regularizer, which enables the model to be more effective for edge-preserving. A firstorder algorithm based on ADMM combining with MM scheme and proximity operator to solve this nonconvex minimization problem. In addition, a sufficient condition for the convergence of the proposed algorithm is provided. Numerical results demonstrate that the proposed model can effectively remove salt and pepper noise while preserving image edges and contours. Moreover, compared with TVL2, NTVL2, TVL1, TGV, NLTV, NRL1 and ASWMF models, the proposed model shows the best performance in terms of PSNR and MSSIM values. It yields about 0.5 dB PSNR and 0.06 MSSIM improvements against all compared models.
It should be point out that our nonconvex TV q regularization may lead to undesired artificial staircase in the restorations. In the future, we will focus on solving this problem by introducing some nonconvex high-order TV regularization. In addition, the ADMM algorithm used in this paper cannot guarantee to find the global optimum of the model. Therefor, another successive research is to combine some other algorithms, such as nature-inspired heuristic algorithms [51][52][53][54] , arithmetic optimization algorithms 55 .  www.nature.com/scientificreports/   Figure 13. The plots of the relative-change of the restorations versus iterations.  www.nature.com/scientificreports/

Data availability
The data that support the findings of this study are available on request from the corresponding author.